TrustRadius
Amazon EMR

Overview

What is Amazon EMR?

Amazon EMR is a cloud-native big data platform for processing vast amounts of data quickly, at scale. Using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto, coupled with the scalability…

Recent Reviews
Recent Reviews

Amazon EMR Review

7 out of 10
September 22, 2020
Incentivized
Amazon EMR is being used by our organization to simplify running big data frameworks, and provide the Amazon EMR highlights, product …

Awards

Products that are considered exceptional by their customers based on a variety of criteria win TrustRadius awards.


Product Details

What is Amazon EMR?

Amazon EMR Technical Details

Operating Systems: Unspecified
Mobile Application: No

Frequently Asked Questions

Amazon EMR is a cloud-native big data platform for processing vast amounts of data quickly, at scale. Using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto, coupled with the scalability of Amazon EC2 and the scalable storage of Amazon S3, EMR gives analytical teams the engines and elasticity to run petabyte-scale analysis.

Reviewers rate Support Rating highest, with a score of 9.

The most common users of Amazon EMR are from Mid-sized Companies (51-1,000 employees).

Comparisons


Reviews and Ratings

(60)

Attribute Ratings

Reviews

(1-12 of 12)
Score 8 out of 10
Vetted Review
Verified User
Incentivized
The AWS stack is a big component of the majority of our work. When necessary, EMR is employed in a number of these settings. When we need to process a large amount of data across several EC2 servers, our DevOps team implements it. For our customers, EMR is attractive since it is far less expensive to adopt than alternative solutions, which means that the overall cost savings are substantial.
  • Faster than prior on-premise systems to put in place.
  • Open source software is supported.
  • Reduces the cost of production.
  • Automates the creation and deletion of processing jobs.
  • This service costs more than similar ones.
  • Getting everything up and running at the beginning is a lengthy process.
Amazon EMR is a good fit if you run Apache Spark or Apache Hadoop on-premises and wish to shift to the cloud and save money. It also suits data-processing workloads that fluctuate a lot, since EMR lets you set up flexible and scalable scenarios.
Score 7 out of 10
Vetted Review
Verified User
Incentivized
We use Amazon EMR (Elastic MapReduce) to run various types of algorithms related to health like calculation of body mass index, heart rate and similar parameters on vast amounts of data. We do this for developing a prototype of a health analysis device that users can wear on their body - something like a smart watch fitness tracker.
  • They have excellent tech support
  • Reduced processing times
  • Easy to configure
  • Pricing should be better
  • User Interface should be more attractive
  • Faster ramp up
Scenarios where it is good:
1. Where speed is important, and there is a vast amount of data to process
2. Configuration setup needs to be fast

Scenarios where it is not good:
1. For small companies which do not have enough money
2. For one-off uses, since the ramp up curve is high
Score 8 out of 10
Vetted Review
Verified User
Incentivized
For some clients, we have our product hosted on several AWS products, and when it comes to retrieving big volumes of data we use the Amazon EMR service. It has aided us in becoming more productive and saving time and effort. AWS is our go-to service for most of our needs.
  • very easy to configure
  • easy to manage large amounts of data
  • very quick in executing transformations
  • not a recommended service to manage smaller amounts of data
  • expensive
  • not the best user interface out there
Our teams prefer using this service to deploy because it is simple to configure and scale even though it can be expensive at times. It also needs some training for new users to get familiar with all the functions and features. Experience matters a lot while using this platform.
José David Rodríguez Gómez | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
On-demand transient clusters for big data processing. I like that its availability in different cost tiers makes it highly flexible for customers of different scales. It can be pre-installed with any big data tools like Hive, Spark, Pig, etc. Detailed cluster monitoring helps to track several metrics, which in turn helps to reduce cost.
  • Big data processing.
  • The resizing feature is good.
  • Ease of use and creating new clusters.
  • The user interface could use a facelift.
  • Overhead delay in starting clusters.
  • Big learning curve for someone who hasn't used a program like this before.
We use it to move processing that takes a few hours on EC2 onto a Spark-based EMR cluster, which completes the processing within minutes rather than hours. It is easy to use and lets you choose between Hadoop and Spark. Processing time drops from 5-8 hours to 25-30 minutes compared with the EC2 instance, and by more in some cases.
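As a rough check on the figures quoted in this review, the short Python sketch below converts the reported run times into a speedup range. The function name `speedup_range` is hypothetical, and the only inputs are the reviewer's own numbers (5-8 hours on EC2, 25-30 minutes on EMR); nothing here is measured.

```python
def speedup_range(before_min, after_min):
    """Return (lowest, highest) speedup for two (min, max) duration ranges in minutes."""
    before_lo, before_hi = before_min
    after_lo, after_hi = after_min
    # Most conservative case: shortest old run time vs. longest new run time.
    lowest = before_lo / after_hi
    # Most optimistic case: longest old run time vs. shortest new run time.
    highest = before_hi / after_lo
    return lowest, highest


# Reviewer's figures: 5-8 hours on EC2, 25-30 minutes on EMR.
ec2_minutes = (5 * 60, 8 * 60)
emr_minutes = (25, 30)

low, high = speedup_range(ec2_minutes, emr_minutes)
print(f"Speedup between {low:.1f}x and {high:.1f}x")
```

On those figures the claimed improvement works out to roughly one order of magnitude, which is consistent with the reviewer's "minutes rather than hours" summary.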
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Our product is hosted on various AWS EC2 instances for some clients, and when it requires pulling large amounts of data and performing large transformations on client data, we use the Amazon EMR service to get that work done. Usage is limited to a few clients of our product rather than the entire client base.
  • Manages large databases well.
  • Usage Monitoring.
  • Quick
  • Cost effective.
  • A little complex to set up.
  • Could be costly if not configured well.
  • Difficult to manage.
If you are working with a large dataset on AWS, it's worth using. For a small dataset it doesn't make sense, as it's complex to manage.
Score 10 out of 10
Vetted Review
Verified User
Incentivized
We migrated the entire Hadoop structure to Amazon EMR; the cost and maintenance are much better compared to other solutions on the market. We created a recommender system filter on big data. We needed a low runtime to meet our demand, and we were able to achieve it with Amazon EMR. We have a lot of data science tasks, like calculating statistics across various math calculations to apply the business rules. Definitely one of the best services for working with big data.
  • Faster processing.
  • The distributed computation of the calculations.
  • Easy to setup.
  • Monitoring as an add-on.
  • Can be integrated with lots of technologies.
  • Overhead delay in starting clusters which can cause problems.
It provides a nice graphical user interface to manage and work with big data MapReduce tasks instead of manual configuration with Hadoop or the CLI. It saves a lot of time and effort. We create big data monitoring system filters.

It provides a good GUI to manage and handle big data MapReduce tasks, and its configuration saves a lot of time and effort.
April 06, 2022

AWS has it all!

Jonathan Brotto | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
To keep my review simple it is very convenient that AWS has a MapReduce tool as it was easy to deploy and test with our cloud setup. Also with AWS being well known it is easy to find staff who can use and set up a system and scale our solutions. Definitely an industry leader.
  • Scalable
  • Flexible
  • Good documentation
  • Cost effective
  • Integration with ERP for SMEs.
  • To connect to non cloud solutions and replicate data for backup.
  • Better performance metrics for business people such as cost benefits.
It is well suited when I need to process large data into meaningful information. It is very flexible, in that I can scale based on the data size and how I want to analyze it. It can still improve for nontechnical users, as there is some jargon to learn to get the most out of the solution.
Nicolas Costa Ossa | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
We are a certified AWS partner agency, and we use a lot of the AWS stack for most of our projects. EMR is used in several of them when required. It is implemented by our DevOps team, and we mostly use it when we need to process a lot of data across EC2 instances. EMR is very compelling to our customers because it is easier to implement (hence less dev cost) and it is far more efficient at managing the data than other tools, so the overall cost reduction is considerable.
  • Easier to implement than older on-premise solutions
  • Works with open source technologies.
  • Keeps processing cost low.
  • It is flexible and works also for short term workloads and the pricing changes to that model.
  • You definitely need to be trained before using it because the interface can be a little confusing. It is a professional service model, so I recommend a certified dev.
For example, when you have Apache Spark or Apache Hadoop on-premise deployments and you want to move to the cloud and reduce costs, EMR is the right tool. It is also a good fit when your data-processing workload has lots of ups and downs, because AWS's EMR can help you by setting up flexible/scalable scenarios.
Score 8 out of 10
Vetted Review
Verified User
Incentivized
Amazon Elastic MapReduce may be a mouthful (EMR is much easier to say), but like taking that string and reducing it to its acronym, it takes a complex set of data and reduces it to something manageable and understandable. It's been deployed as a solution for massive, spread-out data that needs to be consolidated.
  • Makes massive data easier to manage
  • Backed by Amazon and AWS
  • Makes analyzing data easy
  • Support more data frameworks
  • API integration
  • Cloud service integration
Amazon is the big player in the data game right now as it even seems to push Google out of the way in some instances. Because of this you know they treat your data well and also deal with a ton themselves. That makes them good at a comparably smaller data set like most companies have.
Thomas Young | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
Amazon Elastic MapReduce is used by my department to produce big data analytics for certain clients. The software addresses data mining and predictive analytics for data sets that take a long time to process. The software is not used for econometric or other analytical evaluation because the size of the data sets does not lend itself to such analysis. The software is used almost exclusively for data mining and simple reporting for large data cases.
  • Amazon Elastic MapReduce works well for managing analyses that use multiple tools, such as Hadoop and Spark. If it were not for the fact that we use multiple tools, there would be less need for MapReduce.
  • MapReduce is always on. I've never had a problem getting data analyses to run on the system. It's simple to set up data mining projects.
  • Amazon Elastic MapReduce has no problems dealing with very large data sets. It processes them just fine. With that said, the outputs don't come instantaneously. It takes time.
  • The analytical processes generally run quicker with the standalone tools of Hadoop, Spark, and others. If you only use one big data tool and don't really need things simplified, then Elastic MapReduce is more of an overhead tool that doesn't add much value.
  • The analytical capabilities of Elastic MapReduce are nowhere near as complex or broad as non-big data tools. I would suggest not using the tool unless your data really is big data.
  • The machine learning capabilities of Elastic MapReduce (using the big data tools of Hadoop/Spark) are good but are not as easy to use as other machine learning tools.
Amazon Elastic MapReduce is useful in cases where two conditions are met. First, you are planning on using multiple big data tools simultaneously to analyze big data sets. Second, you need a tool that simplifies managing big data tools. If these two conditions are met, MapReduce does a great job. The user interface is simple. The program eliminates some programming requirements. The software also makes setting up big data analyses much easier. With these benefits acknowledged, MapReduce is not a good tool for "small" data analyses, given that there are other tools that do the job quicker and with much more professional output. If you're on the fence, try out MapReduce alongside competing "small" data tools and see if you really need big data software.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
We use Amazon EMR for big data storage and processing. It's a cluster architecture, with each department having its own clusters. It's great for processing and storing large volumes of data, specifically data that is unstructured and generated very rapidly, like network logs.
  • Distributed computing
  • Fault tolerant
  • Uptime
  • Providing user-friendly tools for HDFS access
  • Simpler APIs for easy access and processing
  • Memory requirement
If you don't have big data, i.e. petabytes of data with terabytes of data generated every day, then don't use Hadoop. Relational databases are enough for terabytes of data. Hadoop is not well suited for transactional systems or data.
October 25, 2017

AWS EMR at a glance!!

Score 7 out of 10
Vetted Review
Verified User
Incentivized
We used AWS EMR before starting to use Databricks on EC2 instances. EMR was solving the problem, but we needed a better solution (Enterprise edition) to manage our workbooks and a better scheduler for running our jobs. EMR was working fine, but we did not find it user friendly for adding data nodes on demand. We used EMR primarily to process data on AWS S3 using the Hadoop and Spark frameworks. We also used AWS SWF to orchestrate our job flow by adding steps. It was used widely by the data processing team and not by the entire organization, as most of the data was on local servers. It addresses problems like processing data that does not need to be processed live, as the cluster can be spun up and then shut down once the job is completed. It is cost efficient (especially if you do not need data nodes and only task nodes), scalable and reliable.
  • EMR does well in managing cost, as it uses the task node cores to process the data, and these instances are cheaper when the data is stored on S3. It is really cost efficient. No need to maintain any libraries to connect to AWS resources.
  • EMR is highly available, secure and easy to launch. Not much hassle in launching the cluster (simple and easy).
  • EMR manages the big data frameworks, so the developer need not worry about maintaining memory and framework settings. It's all set up at launch time. The bootstrapping feature is great.
  • Sometimes bootstrapping certain tools comes with debugging costs. The tools provided by some of the enterprise editions are great compared to EMR.
  • Unlike some of the enterprise editions, EMR does not provide on-premises options.
  • No UI client for saving the workbooks or code snippets. Everything has to go through submitting process. Not really convenient for tracking the job as well.
EMR is suited to jobs that are long running and don't need much monitoring. EMR is really flexible in processing data on S3, as a developer doesn't need to spend time debugging the connections to S3 from a big data framework; most of the configuration is taken care of by Amazon. It is very cheap compared to most of the solutions on the market, and the ready-to-go configuration at launch time reduces the time required for admin tasks. So, considering the low cost, the processing options on S3 and the scalability via adding task nodes, EMR serves startups well when they are considering open source and cost-efficient options.

However, EMR comes with its own disadvantages. There is no proper UI to track real-time jobs, which is possible with Enterprise editions like Cloudera, Hortonworks etc. EMR could provide an interface to add workbooks and code snippets in the cluster, as it would reduce the time to submit tasks. EMR also lacks the ability to automatically replace unhealthy nodes.
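The transient-cluster pattern this review describes (spin a cluster up, run steps against data on S3, shut it down when the job completes, and lean on task nodes for compute) can be sketched as a request for the `run_job_flow` call in boto3's EMR client. This is a minimal sketch under stated assumptions, not the reviewer's setup: the helper name `build_transient_cluster_request`, the release label, the instance types and counts, the use of Spot pricing for task nodes, and the S3 paths are all illustrative. The block only builds the request dictionary, so it runs without AWS credentials.

```python
def build_transient_cluster_request(name, script_s3_uri, log_s3_uri,
                                    core_count=2, task_count=4):
    """Build keyword arguments for boto3's EMR `run_job_flow` call.

    All defaults (release label, instance types, counts, Spot market)
    are illustrative assumptions, not recommendations from the review.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; choose your own
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "LogUri": log_s3_uri,
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_count},
                # Task nodes add compute without HDFS storage.
                # Spot pricing here is an assumption for illustration.
                {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
                 "InstanceType": "m5.xlarge", "InstanceCount": task_count},
            ],
            # Terminate the cluster once the last step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [{
            "Name": "Process S3 data with Spark",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         script_s3_uri],
            },
        }],
        # Default role names created by the EMR console or CLI.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_transient_cluster_request(
    "nightly-s3-batch",
    "s3://example-bucket/jobs/process.py",  # hypothetical path
    "s3://example-bucket/emr-logs/",        # hypothetical path
)
# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
print([g["InstanceRole"] for g in request["Instances"]["InstanceGroups"]])
```

Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what makes the cluster transient: it terminates after the final step, so you pay only while the job runs.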